factored representation
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > China > Hong Kong (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- (3 more...)
Look Ma, No Hands!
The analysis and use of egocentric videos for robotic tasks is made challenging by occlusion due to the hand and the visual mismatch between the human hand and a robot end-effector. In this sense, the human hand presents a nuisance. However, often hands also provide a valuable signal, e.g. the hand pose may suggest what kind of object is being held.
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (2 more...)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Look Ma, No Hands! Agent-Environment Factorization of Egocentric Videos
The analysis and use of egocentric videos for robotics tasks is made challenging by occlusion and the visual mismatch between the human hand and a robot end-effector. Past work views the human hand as a nuisance and removes it from the scene. However, the hand also provides a valuable signal for learning. In this work, we propose to extract a factored representation of the scene that separates the agent (human hand) and the environment.
Look Ma No Hands
The analysis and use of egocentric videos for robotic tasks is made challenging by occlusion due to the hand and the visual mismatch between the human hand and a robot end-effector. In this sense, the human hand presents a nuisance. However, often hands also provide a valuable signal, e.g. the hand pose may suggest what kind of object is being held.
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (2 more...)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > China > Hong Kong (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Look Ma, No Hands! Agent-Environment Factorization of Egocentric Videos
The analysis and use of egocentric videos for robotics tasks is made challenging by occlusion and the visual mismatch between the human hand and a robot end-effector. Past work views the human hand as a nuisance and removes it from the scene. However, the hand also provides a valuable signal for learning. In this work, we propose to extract a factored representation of the scene that separates the agent (human hand) and the environment. At the heart of this factorization is our proposed Video Inpainting via Diffusion Model (VIDM) that leverages both a prior on real-world images (through a large-scale pre-trained diffusion model) and the appearance of the object in earlier frames of the video (through attention).
Curricula for Learning Robust Policies with Factored State Representations in Changing Environments
Panayiotou, Panayiotis, Şimşek, Özgür
Robust policies enable reinforcement learning agents to effectively adapt to and operate in unpredictable, dynamic, and ever-changing real-world environments. Factored representations, which break down complex state and action spaces into distinct components, can improve generalization and sample efficiency in policy learning. In this paper, we explore how the curriculum of an agent using a factored state representation affects the robustness of the learned policy. We experimentally demonstrate three simple curricula, such as varying only the variable of highest regret between episodes, that can significantly enhance policy robustness, offering practical insights for reinforcement learning in complex environments.
- Research Report (0.64)
- Instructional Material > Course Syllabus & Notes (0.51)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Look Ma, No Hands! Agent-Environment Factorization of Egocentric Videos
Chang, Matthew, Prakash, Aditya, Gupta, Saurabh
The analysis and use of egocentric videos for robotic tasks is made challenging by occlusion due to the hand and the visual mismatch between the human hand and a robot end-effector. In this sense, the human hand presents a nuisance. However, often hands also provide a valuable signal, e.g. the hand pose may suggest what kind of object is being held. In this work, we propose to extract a factored representation of the scene that separates the agent (human hand) and the environment. This alleviates both occlusion and mismatch while preserving the signal, thereby easing the design of models for downstream robotics tasks. At the heart of this factorization is our proposed Video Inpainting via Diffusion Model (VIDM) that leverages both a prior on real-world images (through a large-scale pre-trained diffusion model) and the appearance of the object in earlier frames of the video (through attention). Our experiments demonstrate the effectiveness of VIDM at improving inpainting quality on egocentric videos and the power of our factored representation for numerous tasks: object detection, 3D reconstruction of manipulated objects, and learning of reward functions, policies, and affordances from videos.
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (2 more...)
New Developments in Human-Computer Interaction part1
Abstract: We present a novel, web-based visual eye-tracking analytics tool called Gazealytics. Our open-source toolkit features a unified combination of gaze analytics features that support flexible exploratory analysis, along with annotation of areas of interest (AOI) and filter options based on multiple criteria to visually analyse eye tracking data across time and space. Gazealytics features coordinated views unifying spatiotemporal exploration of fixations and scanpaths for various analytical tasks. A novel matrix representation allows analysis of relationships between such spatial or temporal features. Data can be grouped across samples, user-defined areas of interest (AOIs) or time windows of interest (TWIs) to support aggregate or filtered analysis of gaze activity.
Learning in Factored Domains with Information-Constrained Visual Representations
Malloy, Tyler, Liu, Miao, Riemer, Matthew D., Klinger, Tim, Tesauro, Gerald, Sims, Chris R.
Humans learn quickly even in tasks that contain complex visual information. This is due in part to the efficient formation of compressed representations of visual information, allowing for better generalization and robustness. However, compressed representations alone are insufficient for explaining the high speed of human learning. Reinforcement learning (RL) models that seek to replicate this impressive efficiency may do so through the use of factored representations of tasks. These informationally simplistic representations of tasks are similarly motivated as the use of compressed representations of visual information. Recent studies have connected biological visual perception to disentangled and compressed representations. This raises the question of how humans learn to efficiently represent visual information in a manner useful for learning tasks. In this paper we present a model of human factored representation learning based on an altered form of a $\beta$-Variational Auto-encoder used in a visual learning task. Modelling results demonstrate a trade-off in the informational complexity of model latent dimension spaces, between the speed of learning and the accuracy of reconstructions.
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)